Feature Selection for Highly Skewed Sentiment Analysis Tasks

Authors

  • Can Liu
  • Sandra Kübler
  • Ning Yu
Abstract

Sentiment analysis generally uses large feature sets based on a bag-of-words approach, so individual features are not very informative. In addition, many data sets tend to be heavily skewed. We approach this combination of challenges by investigating feature selection to reduce the large number of features to those that are discriminative. We examine the performance of five feature selection methods on two sentiment analysis data sets from different domains, each with different ratios of class imbalance. Our findings show that feature selection can improve classification accuracy only in balanced or slightly skewed situations; high skewing ratios remain difficult to mitigate. We also conclude that no single method performs best across data sets and skewing ratios. However, we found that TF*IDF2 can help identify the minority class even in highly imbalanced cases.
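As a rough illustration of what reducing a bag-of-words feature set to discriminative terms can look like, the sketch below scores terms with a chi-square statistic on a tiny skewed corpus and keeps the top-ranked ones. The abstract does not name the five methods studied, so chi-square is an assumption chosen for illustration, not necessarily one of the authors' methods; the function names `chi_square_scores` and `select_top_k` and the toy data are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive="pos"):
    """Score each term by the chi-square statistic of its 2x2 term/class table.

    Illustrative only: the source abstract does not specify its selection methods.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos

    # Document frequency of each term, counted separately per class.
    df_pos, df_neg = Counter(), Counter()
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())  # bag-of-words, presence only
        (df_pos if y == positive else df_neg).update(terms)

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive documents containing the term
        b = df_neg[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus with a 1:4 class ratio (hypothetical data).
docs = ["great phone", "bad phone", "bad battery", "bad screen", "poor phone"]
labels = ["pos", "neg", "neg", "neg", "neg"]

scores = chi_square_scores(docs, labels)
print(select_top_k(scores, 2))
```

With so few minority-class documents, a term's score depends on very small counts, which is one way to see why heavy skew makes selection harder.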


Similar resources

Persian Sentiment Analyzer: A Framework based on a Novel Feature Selection Method

In the past decade, with the enormous growth of digital content on the internet and in databases, sentiment analysis has received more and more attention among information retrieval and natural language processing researchers. Sentiment analysis aims to use automated tools to detect subjective information from reviews. One of the main challenges in sentiment analysis is feature selection. Feature ...


Impact of Feature Selection Techniques for Tweet Sentiment Classification

Sentiment analysis of tweets is a powerful application of mining social media sites that can be used for a variety of social sensing tasks. Common feature engineering techniques frequently result in a large number of features being generated to represent tweets. Many of these features may degrade classifier performance and increase computational cost. Feature selection techniques can be used...


Feature Selection Using Multi-objective Optimization for Aspect Based Sentiment Analysis

In this paper, we propose a system for aspect-based sentiment analysis (ABSA) by incorporating the concepts of multi-objective optimization (MOO), distributional thesaurus (DT) and unsupervised lexical induction. The task can be thought of as a sequence of processes such as aspect term extraction, opinion target expression identification and sentiment classification. We use MOO for selecting th...


Sentiment Analysis with Deeply Learned Distributed Representations of Variable Length Texts

Learning good semantic vector representations for phrases, sentences and paragraphs is a challenging and ongoing area of research in natural language processing and understanding. In this project, we survey and implement several deep-learning and deep-learning-inspired approaches and evaluate these algorithms on several sentiment-labeled datasets and analysis tasks. In doing so, we demonstrate n...


Feature Extraction and Efficiency Comparison Using Dimension Reduction Methods in Sentiment Analysis Context

Nowadays, users can share their ideas and opinions with widespread access to the Internet and especially social networks. On the other hand, the analysis of people's feelings and ideas can play a significant role in the decision making of organizations and producers. Hence, sentiment analysis or opinion mining is an important field in natural language processing. One of the most common ways to ...



Publication date: 2014